Conservativeness of Untied Auto-Encoders

Authors

  • Daniel Jiwoong Im
  • Mohamed Ishmael Diwan Belghazi
  • Roland Memisevic

*Authors contributed equally.
Abstract

We discuss necessary and sufficient conditions for an auto-encoder to define a conservative vector field, in which case it is associated with an energy function akin to the unnormalized log-probability of the data. We show that the conditions for conservativeness are more general than requiring encoder and decoder weights to be the same (“tied weights”), and that they also depend on the form of the hidden unit activation function, but that contractive training criteria, such as denoising, will enforce these conditions locally. Based on these observations, we show how we can use auto-encoders to extract the conservative component of a vector field.

Introduction

An auto-encoder is a feature learning model that learns to reconstruct its inputs by going through one or more capacity-constrained “bottleneck” layers. Since it defines a mapping r : R^D → R^D on the D-dimensional data space, an auto-encoder can also be viewed as a dynamical system that is trained to have fixed points at the data (Seung 1998). Recent renewed interest in the dynamical-systems perspective led to a variety of results that help clarify the role of auto-encoders and their relationship to probabilistic models. For example, (Vincent et al. 2008; Swersky et al. 2011) showed that training an auto-encoder to denoise corrupted inputs is closely related to performing score matching (Hyvärinen 2005) in an undirected model. Similarly, (Alain and Bengio 2014) showed that training the model to denoise inputs, or to reconstruct them under a suitable choice of regularization penalty, lets the auto-encoder approximate the derivative of the empirical data density. And (Kamyshanska 2013) showed that, regardless of training criterion, any auto-encoder whose weights are tied (decoder weights identical to the encoder weights) can be written as the derivative of a scalar “potential” or energy function, which in turn can be viewed as an unnormalized data log-probability. For sigmoid hidden units the potential function is exactly identical to the free energy of an RBM, which shows that there is a tight link between these two types of model.

The same is not true for untied auto-encoders, for which it has not been clear whether such an energy function exists. It has also not been clear under which conditions an energy function exists or does not exist, or even how to define it in the case where decoder weights differ from encoder weights. In this paper, we describe necessary and sufficient conditions for the existence of an energy function, and we show that suitable learning criteria will lead to an auto-encoder that satisfies these conditions at least locally, near the training data. We verify our results experimentally. We also show how we can use an auto-encoder to extract the conservative part of a vector field.

Background

We will focus on auto-encoders of the form

    r(x) = R h(Wx + b) + c    (1)

where x ∈ R^D is an observation, R and W are decoder and encoder weights, respectively, b and c are biases, and h(·) is an elementwise hidden activation function. An auto-encoder can be identified with its vector field, r(x) − x, which is the set of vectors pointing from observations to their reconstructions. The vector field is called conservative if it can be written as the gradient of a scalar function F(x), called the potential or energy function:

    r(x) − x = ∇F(x)    (2)

The energy function corresponds to the unnormalized probability of the data. In this case, we can integrate the vector field to find the energy function (Kamyshanska 2013).
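A standard way to probe conservativeness numerically: a smooth vector field on a simply connected domain is conservative precisely when its Jacobian is symmetric everywhere, and for the model in Eq. (1) the Jacobian of r(x) − x is R diag(h′(Wx + b)) W − I. The NumPy sketch below is our own illustration, not code from the paper; it assumes sigmoid activations and uses random parameters only to show that tied weights make this Jacobian symmetric while untied weights generally do not.

import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def reconstruction_jacobian(x, W, R, b):
    # Jacobian of r(x) = R h(Wx + b) + c with respect to x, for h = sigmoid:
    # dr/dx = R diag(h'(Wx + b)) W, with h'(u) = h(u) (1 - h(u)).
    a = sigmoid(W @ x + b)
    return R @ np.diag(a * (1.0 - a)) @ W

def asymmetry(J):
    # Frobenius norm of the antisymmetric part; zero is necessary for a conservative field.
    return np.linalg.norm(J - J.T)

rng = np.random.default_rng(0)
D, H = 5, 8                              # input and hidden sizes, arbitrary for illustration
W = rng.standard_normal((H, D))
b = rng.standard_normal(H)
x = rng.standard_normal(D)

J_tied = reconstruction_jacobian(x, W, W.T, b)                        # tied weights: R = W^T
J_untied = reconstruction_jacobian(x, W, rng.standard_normal((D, H)), b)

print(asymmetry(J_tied))    # ~1e-16: tied weights give a symmetric Jacobian
print(asymmetry(J_untied))  # typically O(1): untied weights need not define a conservative field

In the paper's terms, denoising or contractive training should drive this asymmetric part toward zero near the training data, which is what makes an energy function locally well defined even for untied weights.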
For an auto-encoder with tied weights and real-valued observations it takes the form

    F(x) = ∫ h(u) du − (1/2)‖x − c‖² + const    (3)

where u = Wx + b is an auxiliary variable and h(·) can be any elementwise activation function with a known antiderivative. For example, the energy function of an auto-encoder with sigmoid activation function is identical to the (Gaussian) RBM free energy (Hinton 2010).
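As a worked special case (our own spelling-out, using only the fact that the antiderivative of the logistic sigmoid is the softplus log(1 + e^u)), Eq. (3) with h(u) = 1/(1 + e^{−u}) and u = Wx + b evaluates componentwise to

% With sigmoid hiddens, \int \sigma(u)\,du = \log(1 + e^{u}), so Eq. (3) becomes
\[
  F(x) \;=\; \sum_{k} \log\!\left(1 + e^{\,W_{k}x + b_{k}}\right)
  \;-\; \frac{1}{2}\,\lVert x - c \rVert^{2} \;+\; \mathrm{const},
\]

where W_k denotes the k-th row of W and b_k the k-th hidden bias; this is the form identified above with the (Gaussian) RBM free energy.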

Similar resources

On the Latent Space of Wasserstein Auto-Encoders

We study the role of latent space dimensionality in Wasserstein auto-encoders (WAEs). Through experimentation on synthetic and real datasets, we argue that random encoders should be preferred over deterministic encoders. We highlight the potential of WAEs for representation learning with promising results on a benchmark disentanglement task.


Latent Dimensionality and Random Encoders

We study the role of latent space dimensionality in Wasserstein auto-encoders (WAEs). Through experimentation on synthetic and real datasets, we argue that random encoders should be preferred over deterministic encoders.


Learning invariant features through local space contraction

We present in this paper a novel approach for training deterministic auto-encoders. We show that by adding a well-chosen penalty term to the classical reconstruction cost function, we can achieve results that equal or surpass those attained by other regularized auto-encoders as well as denoising auto-encoders on a range of datasets. This penalty term corresponds to the Frobenius norm of the Jac...
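For concreteness, the penalty this abstract refers to (the squared Frobenius norm of the Jacobian of the hidden representation with respect to the input) can be computed in closed form for a sigmoid encoder. The sketch below is our own illustration, not code from that article, and assumes h(x) = sigmoid(Wx + b):

import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def contractive_penalty(x, W, b):
    # Squared Frobenius norm of d h(x) / d x for h(x) = sigmoid(Wx + b).
    # The Jacobian is diag(a (1 - a)) W, so the norm factors into per-unit terms:
    # sum_j (a_j (1 - a_j))^2 * ||W_j||^2.
    a = sigmoid(W @ x + b)
    return float(np.sum((a * (1.0 - a)) ** 2 * np.sum(W ** 2, axis=1)))

Adding a weighted copy of this term to the reconstruction loss gives a contractive objective of the kind the abstract describes.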


Saturating Auto-Encoders

We introduce a simple new regularizer for auto-encoders whose hidden-unit activation functions contain at least one zero-gradient (saturated) region. This regularizer explicitly encourages activations in the saturated region(s) of the corresponding activation function. We call these Saturating Auto-Encoders (SATAE). We show that the saturation regularizer explicitly limits the SATAE’s ability t...


Generative Adversarial Source Separation

Generative source separation methods, such as non-negative matrix factorization (NMF) or auto-encoders, rely on the assumption of an output probability density. Generative Adversarial Networks (GANs) can learn data distributions without needing a parametric assumption on the output density. We show on a speech source separation experiment that a multilayer perceptron trained with a Wasserstein-...



Publication date: 2016